Skip to content

Conversation

@FMarno
Copy link
Owner

@FMarno FMarno commented Oct 17, 2024

No description provided.

alexfh and others added 30 commits October 9, 2024 14:15
This manifested as an assertion failure in Clang built against libc++
with
hardening enabled (e.g.
-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG):
`libcxx/include/__memory/unique_ptr.h:596: assertion
__checker_.__in_bounds(std::__to_address(__ptr_), __i) failed:
unique_ptr<T[]>::operator[](index): index out of range`
After 7f74651, the pointer operand may be replicated of a PtrAdd. Instead
of requesting a single scalar, request lane 0, which correctly handles the
case when there is a scalar-per-lane.

Fixes llvm#111606.
This commit adds the ViewLikeOpInterface to the GEP and AddrSpaceCast
operations. This allows us to simplify the inliner interface. At the
same time, the change also makes the inliner interface more extensible
for downstream users that have custom view-like operations.
…aders for LinalgDialect (llvm#111603)

This fixes non-deterministic build failures.

Fixes llvm#111527

---------

Co-authored-by: zecheng.zhang <[email protected]>
Co-authored-by: Mehdi Amini <[email protected]>
llvm#111451)

We add static methods to APFloatBase to allow the hasZero and
hasSignedRepr properties of fltSemantics to be obtained.
)

On Apple platforms, using system-libcxxabi as an ABI library wouldn't
work because we'd try to re-export symbols from libc++abi that the
system libc++abi.dylib might not have. Instead, only re-export those
symbols when we're using the in-tree libc++abi.

This does mean that libc++.dylib won't re-export any libc++abi symbols
when building against the system libc++abi, which could be fixed in
various ways. However, the best solution really depends on the intended
use case, so this patch doesn't try to solve that problem.

As a drive-by, also improve the diagnostic message when the user forgets
to set the LIBCXX_CXX_ABI_INCLUDE_PATHS variable, which would previously
lead to a confusing error.

Closes llvm#104672
…11533)

This patch implements speculation for
vector.transfer_read/vector.transfer_write ops, allowing these ops to
work with LICM.
Add the permutation clause for the interchange directive which will be
introduced in the upcoming OpenMP 6.0 specification. A preview has been
published in
[Technical Report12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf).
…ls. (llvm#109708)

Make legacy cost retrieval independent of getInstructionForCost by
sinking it to more specific ::computeCost implementation (specifically
VPInterleaveRecipe::computeCost and VPSingleDefRecipe::computeCost).

Inline getInstructionForCost to VPRecipeBase::cost(), as it is now only
used to decide which recipes to skip during cost computation and when to
apply forced costs.

PR: llvm#109708
…t(pcmpeq(and(X,Pow2),Pow2),B,A)

Matches what we already do in LowerVSETCC to reuse an existing constant

Fixes llvm#110875
…lvm#111600)

The `SymbolTableListTraits` template is explicitly instantiated for the
following types:
  * `llvm/lib/IR/Function.cpp`
    - `BasicBlock`
  * `llvm/lib/IR/Module.cpp`
    - `Function`
    - `GlobalAlias`
    - `GlobalIFunc`
    - `GlobalVariable`

When LLVM is built on Windows with the `LLVM_EXPORT_SYMBOLS_FOR_PLUGINS`
option enabled, the implicit instantiation of the template prevents the
`SymbolTableListTraits` template from being exported. This causes link
errors when the template or IR API is used in a plugin.

This change prevents the template being implicitly instantiated for
these types.
…1540)

Currently this test is completely xfailed as part of the patch
llvm#106077. But this test works on
A and R profile, not in v7M profile. Because the test contain cases in
which m-profile will fail for atomic types greater than 4 bytes in size.
This was using addrspace 0 and 1 pointers interchangably. This works
out since they happen to use the same size, but consistently query
or use the correct one.
…#111541 into a canonicalization (llvm#111614)

This is a reasonable canonicalization because `extract` is more
constrained than `extract_strided_slices`, so there is no loss of
semantics here, just lifting an op to a special-case higher/constrained
op. And the additional `shape_cast` is merely adding leading unit dims
to match the original result type.

Context: discussion on llvm#111541. I wasn't sure how this would turn out,
but in the process of writing this PR, I discovered at least 2 bugs in
the pattern introduced in llvm#111541, which shows the value of shared
canonicalization patterns which are exercised on a high number of
testcases.

---------

Signed-off-by: Benoit Jacob <[email protected]>
With some restrictions, BIND(C) derived types can be converted to
compatible BIND(C) derived types.
Semantics already support this, but ConvertOp was missing the
conversion of such types.

Fixes llvm#107783
These should be well behaved address computations.
…oops (llvm#111656)

Properly handles `cycle` branching inside target distribute loops.
goldsteinn and others added 24 commits October 10, 2024 01:07
Same logic as other callsites, if the attributes are intersectable, we
merge.

Closes llvm#111713
…#111759)

These were split in 0e8208e, with the
only functional difference between them at the time being `--prepend_env
PATH=%{lib-dir}` in the static config and `--prepend_env
PATH=%{install-prefix}/bin` in the shared library config.

However this difference is unnecessary - the static library config
doesn't need any `--prepend_env` argument at all. Before
0e8208e, both configurations used the
same config file, where the `--prepend_env` argument was unnecessary but
benign in the static case.

Reduce the unnecessary config duplication in this case, and return these
configs to using one single config file for both setups.
FMINNM/FMAXNM instructions of AArch64 follow IEEE754-2008. We can use
them to canonicalize a floating point number. And
FMINNUM_IEEE/FMAXNUM_IEEE is used by something like expanding
FMINIMUMNUM/FMAXIMUMNUM, so let's define them.

Update combine_andor_with_cmps.ll.
Add fp-maximumnum-minimumnum.ll, with nnan testcases only.

V1F64 is not supported yet.
If we set v1f64 as legal, FMINNUM/FMAXNUM will have some problem:
   both of them use `if (isOperationLegalOrCustom(FMAXNUM_IEEE, VT))`.

AArch64 depends on `expandFMINNUM_FMAXNUM` returning `SDValue()`
for FMAXNUM and FMINNUM.

We should fix this problem, while it will be in future patch.
This finishes the clang implementation of P0522, getting rid of the
fallback to the old, pre-P0522 rules.

Before this patch, when partial ordering template template parameters,
we would perform, in order:
* If the old rules would match, we would accept it. Otherwise, don't
generate diagnostics yet.
* If the new rules would match, just accept it. Otherwise, don't
generate any diagnostics yet again.
* Apply the old rules again, this time with diagnostics.

This situation was far from ideal, as we would sometimes:
* Accept some things we shouldn't.
* Reject some things we shouldn't.
* Only diagnose rejection in terms of the old rules.

With this patch, we apply the P0522 rules throughout.

This needed to extend template argument deduction in order to accept the
historial rule for TTP matching pack parameter to non-pack arguments.
This change also makes us accept some combinations of historical and
P0522 allowances we wouldn't before.

It also fixes a bunch of bugs that were documented in the test suite,
which I am not sure there are issues already created for them.

This causes a lot of changes to the way these failures are diagnosed,
with related test suite churn.

The problem here is that the old rules were very simple and
non-recursive, making it easy to provide customized diagnostics, and to
keep them consistent with each other.

The new rules are a lot more complex and rely on template argument
deduction, substitutions, and they are recursive.

The approach taken here is to mostly rely on existing diagnostics, and
create a new instantiation context that keeps track of this context.

So for example when a substitution failure occurs, we use the error
produced there unmodified, and just attach notes to it explaining that
it occurred in the context of partial ordering this template argument
against that template parameter.

This diverges from the old diagnostics, which would lead with an error
pointing to the template argument, explain the problem in subsequent
notes, and produce a final note pointing to the parameter.
Summary:
Option `-fskip-odr-check-in-gmf` is set by default and I think it is
what most of C++ developers want. But in header units, Clang ODR
checking is too strict, making them hard to use, as seen in the example
in the diff. This diff relaxes ODR checks for unnamed modules to match
GMF ODR checking.

Test Plan: check-clang
…07350)

With this change, we discriminate if the primary template and which
partial specializations would have participated in overload resolution
prior to P0522 changes.

We collect those in an initial set. If this set is not empty, or the
primary template would have matched, we proceed with this set as the
candidates for overload resolution.

Otherwise, we build a new overload set with everything else, and proceed
as usual.
…te calls. (llvm#111457)

Clang previously missed implementing P0522 pack matching for deduced
function template calls.

Fixes llvm#111363
Extra builders for CallIntrinsicOp.
This is inspired by the comment from @antiagainst from
[here](llvm#108933 (comment)).
…11679)

This DAG combine replaces a floating-point load/store pair which has no
other uses with an integer one, but did not copy the memory operand
flags to the new instructions, resulting in it dropping the volatile
flag. This optimisation is still valid if one or both of the
instructions is volatile, so we can copy over the whole
MachineMemOperand to generate volatile integer loads and stores where
needed.
These might also be called with vectors, but we don't support that.
This does a global rename from `flang-new` to `flang`. I also
removed/changed any TODOs that I found related to making this change.

---------

Co-authored-by: H. Vetinari <[email protected]>
Co-authored-by: Andrzej Warzynski <[email protected]>
llvm#111797)

This commit fixes a bug in the import of nameless globals. Before this
change, the fake symbol names were only generated during the
transformation of the definition. This caused issues when the symbol was
used before it was defined.
…e. (llvm#111428)

Similar to 112aac4, this converts log
libcalls to llvm.log.f64 intrinsics if we know they do not set errno, as
the input is not zero and not negative. As log will produce errno if the
input is 0 (returning -inf) or if the input is negative (returning nan),
we also perform the conversion when we have noinf and nonan.
follow up work of llvm#106229, add
create pass overload function to create pass.

---------

Co-authored-by: jingzec <[email protected]>
- Add handling for unsigned integers to hlsl_elementwise_sign
- Use `select` instead of adding dx and spirv intrinsics for unsigned
integers as [discussed previously
](llvm#101988 (comment))

fixes llvm#70078

### Related PRs
- llvm#101987
- llvm#101988
- llvm#101989

cc @farzonl @pow2clk @bob80905 @bogner @llvm-beanz
Saves me searching for this every time someone asks.
…XTRACT_SUBVECTOR(V,C1+C2) (llvm#111685)

Extract from the original source vector whenever possible.

This removes a number of dependency bottlenecks and helps a number of shuffle combining cases: either by allowing us to avoid a cross-lane variable shuffle on a slow target by keeping the instruction count below the threshold, or on fast targets make it easier to recognise that the subvectors all came form the same source.
…lvm#111720)

Fixes missing m0 initialize for pre-gfx9 targets with local extending
loads.
Implement the addMachineSSAOptimizations passes for AMDGPU. Porting
the other generic passes in this category is WIP.
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more
constants to be propagated. We also run PostOrderFunctionAttrs to
improve the information available to ArgumentPromotion's alias analysis,
and SROA to clean up allocas.

static std::optional<uint32_t>
getIntelReqdSubGroupSize(FunctionOpInterface func) {
constexpr llvm::StringLiteral discardableIntelReqdSubgroupSize =
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if we could get this from a function like a IntelReqdSubgroupSizeAttrName function

// CHECK-SAME: %[[I32_VAL:.*]]: i32, %[[I64_VAL:.*]]: i64,
// CHECK-SAME: %[[F16_VAL:.*]]: f16, %[[F32_VAL:.*]]: f32,
// CHECK-SAME: %[[F64_VAL:.*]]: f64, %[[OFFSET:.*]]: i32) {
// CHECK-SAME: %[[F64_VAL:.*]]: f64, %[[OFFSET:.*]]: i32)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CHECK-SAME: %[[F64_VAL:.*]]: f64, %[[OFFSET:.*]]: i32)
// CHECK-SAME: %[[F64_VAL:.*]]: f64, %[[OFFSET:.*]]: i32) attributes {gpu.known_subgroup_size = 16 : i32} {

include the attribute in the check

Also use it for lowering in GPUToLLVMSPV
@FMarno FMarno force-pushed the gpu_known_subgroup_size branch from 65542dd to 31fd327 Compare October 22, 2024 09:48
FMarno pushed a commit that referenced this pull request Oct 22, 2024
…en issue

Since llvm#109628 landed, this test
has been failing on 32-bit Arm.

This is due to a codegen problem (whether added or uncovered by the change,
not known) where the trap instruction is placed after the frame pointer
and link register are restored.

llvm#113154

So the code was:
```
std::__1::vector<int>::operator[](unsigned int):
  sub sp, sp, llvm#8
  str r0, [sp, #4]
  str r1, [sp]
  add sp, sp, llvm#8
  .inst 0xe7ffdefe
  bx lr
```
When lldb saw the trap, the PC was inside operator[] but the frame
information actually pointed to g.

This bug only happens for leaf functions so adding a return type
works around it:
```
std::__1::vector<int>::operator[](unsigned int):
  push {r11, lr}
  mov r11, sp
  sub sp, sp, llvm#8
  str r0, [sp, #4]
  str r1, [sp]
  mov sp, r11
  pop {r11, lr}
  .inst 0xe7ffdefe
  bx lr
```
(and operator[] should return T& anyway)

Now the PC location and frame information should match and the
test passes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.